{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "vm8vn9t8DvC_"
   },
   "source": [
    "# Cassandra"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html)."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "5WjXERXzFEhg"
   },
   "source": [
    "## Overview"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "juAmbgoWD17u"
   },
   "source": [
    "The Cassandra Document Loader returns a list of Langchain Documents from a Cassandra database.\n",
    "\n",
    "You must either provide a CQL query or a table name to retrieve the documents.\n",
    "The Loader takes the following parameters:\n",
    "\n",
    "* table: (Optional) The table to load the data from.\n",
    "* session: (Optional) The cassandra driver session. If not provided, the cassio resolved session will be used.\n",
    "* keyspace: (Optional) The keyspace of the table. If not provided, the cassio resolved keyspace will be used.\n",
    "* query: (Optional) The query used to load the data.\n",
    "* page_content_mapper: (Optional) a function to convert a row to string page content. The default converts the row to JSON.\n",
    "* metadata_mapper: (Optional) a function to convert a row to metadata dict.\n",
    "* query_parameters: (Optional) The query parameters used when calling session.execute .\n",
    "* query_timeout: (Optional) The query timeout used when calling session.execute .\n",
    "* query_custom_payload: (Optional) The query custom_payload used when calling `session.execute`.\n",
    "* query_execution_profile: (Optional) The query execution_profile used when calling `session.execute`.\n",
    "* query_host: (Optional) The query host used when calling `session.execute`.\n",
    "* query_execute_as: (Optional) The query execute_as used when calling `session.execute`."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load documents with the Document Loader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import CassandraLoader"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "### Init from a cassandra driver Session\n",
    "\n",
    "You need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g. with network settings and authentication), but this might be something like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "from cassandra.cluster import Cluster\n",
    "\n",
    "cluster = Cluster()\n",
    "session = cluster.connect()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "You need to provide the name of an existing keyspace of the Cassandra instance:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "Creating the document loader:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-01-19T15:47:25.893037Z",
     "start_time": "2024-01-19T15:47:25.889398Z"
    }
   },
   "outputs": [],
   "source": [
    "loader = CassandraLoader(\n",
    "    table=\"movie_reviews\",\n",
    "    session=session,\n",
    "    keyspace=CASSANDRA_KEYSPACE,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-01-19T15:47:26.399472Z",
     "start_time": "2024-01-19T15:47:26.389145Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-01-19T15:47:33.287783Z",
     "start_time": "2024-01-19T15:47:33.277862Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Document(page_content='Row(_id=\\'659bdffa16cbc4586b11a423\\', title=\\'Dangerous Men\\', reviewtext=\\'\"Dangerous Men,\"  the picture\\\\\\'s production notes inform, took 26 years to reach the big screen. After having seen it, I wonder: What was the rush?\\')', metadata={'table': 'movie_reviews', 'keyspace': 'default_keyspace'})"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "docs[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "### Init from cassio\n",
    "\n",
    "It's also possible to use cassio to configure the session and keyspace."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "import cassio\n",
    "\n",
    "cassio.init(contact_points=\"127.0.0.1\", keyspace=CASSANDRA_KEYSPACE)\n",
    "\n",
    "loader = CassandraLoader(\n",
    "    table=\"movie_reviews\",\n",
    ")\n",
    "\n",
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Attribution statement\n",
    "\n",
    "> Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [
    "5WjXERXzFEhg"
   ],
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.17"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
